
Apprentissage de modèles de mélange à large échelle par Sketching

Translated title: Learning large-scale mixture models by sketching


Abstract

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. Furthermore, new challenges arise from modern database architectures, such as the requirement that learning methods be amenable to streaming, parallel, and distributed computing. In this context, an increasingly popular approach is to first compress the database into a representation called a linear sketch, which satisfies all of these requirements, and then learn the desired information using only this sketch, which can be significantly faster than using the full data if the sketch is small. In this thesis, we introduce a generic methodology to fit a mixture of probability distributions on the data, using only a sketch of the database. The sketch is defined by combining two notions from the reproducing kernel literature, namely kernel mean embeddings and Random Feature expansions. It is seen to correspond to linear measurements of the underlying probability distribution of the data, and the estimation problem is thus analyzed under the lens of Compressive Sensing (CS), in which a (traditionally finite-dimensional) signal is randomly measured and recovered. We extend CS results to our infinite-dimensional framework, give generic conditions for successful estimation, and apply this analysis to many problems, with a focus on mixture model estimation. We base our method on the construction of random sketching operators such that a Restricted Isometry Property (RIP) condition holds with high probability in the Banach space of finite signed measures. In a second part we introduce a flexible heuristic greedy algorithm to estimate mixture models from a sketch. We apply it to synthetic and real data on three problems: the estimation of centroids from a sketch, for which it is significantly faster than k-means; Gaussian Mixture Model estimation, for which it is more efficient than Expectation-Maximization; and the estimation of mixtures of multivariate stable distributions, for which, to our knowledge, it is the only algorithm capable of performing such a task.
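As a rough illustration of the sketching step described in the abstract (a minimal sketch under assumptions, not code from the thesis; the function name and the Gaussian distribution of the frequencies are chosen only for the example), the following Python snippet averages random Fourier features over a dataset, producing the kind of fixed-size linear measurement of the empirical distribution that the abstract refers to.

```python
import numpy as np

def compute_sketch(X, m, scale=1.0, seed=None):
    """Compress an (n, d) dataset into an m-dimensional complex sketch.

    Each sketch entry is the empirical mean of exp(i * <w_j, x>) over the
    data, with frequencies w_j drawn i.i.d. from a Gaussian (an assumption
    made here for illustration).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=scale, size=(m, d))      # random frequencies
    # (n, m) matrix of random features, averaged over the n samples
    return np.exp(1j * (X @ W.T)).mean(axis=0), W

# The sketch of a large dataset is a single m-dimensional vector: it can be
# computed in one streaming pass, and sketches of data partitions can be
# merged by averaging, which is what makes the approach distributable.
X = np.random.default_rng(0).normal(size=(100_000, 2))
z, W = compute_sketch(X, m=50, seed=1)
print(z.shape)   # (50,)
```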

Record details

  • Author

    Keriven, Nicolas;

  • Author affiliation
  • Year: 2017
  • Total pages
  • Format: PDF
  • Language: en
  • Chinese Library Classification
